CWI at INEX 2002

Authors

  • Johan A. List
  • Arjen P. de Vries
Abstract

This paper describes our participation in INEX 2002 (the Initiative for the Evaluation of XML Retrieval) and discusses several aspects of our XML retrieval system: the retrieval model, the document indexing and manipulation scheme, and preliminary evaluation results for our three submitted runs. Our system uses a probabilistic retrieval model that maps (structural) properties of documents to dimensions of relevance and uses these dimensions for retrieval. The study concentrates on one such dimension, coverage, defined as the amount of relevant information present in a document component. We also describe an efficient, database-independent indexing scheme for XML documents based on text regions, along with region operators for selecting and manipulating XML document regions. An initial evaluation of our results, using a rather ad hoc approach, made clear that evaluation measures for structured document retrieval need more discussion and research.
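The text-region indexing mentioned in the abstract can be illustrated with a small sketch. It assumes a simple positional scheme: one position each for a start tag, every word, and an end tag, so that each element becomes a region `(start, end)` and containment reduces to comparing integers. The helper names (`index_regions`, `select`, `contained_in`, `containing_term`) are hypothetical. This is not the authors' actual implementation, which the abstract does not detail.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    tag: str
    start: int  # position of the start tag
    end: int    # position of the end tag


def index_regions(xml_text):
    """Hypothetical scheme: number start tags, words and end tags in document order."""
    root = ET.fromstring(xml_text)
    elements, words = [], []
    pos = 0

    def visit(elem):
        nonlocal pos
        start = pos
        pos += 1  # consume the start tag
        for w in (elem.text or "").split():
            words.append((w.lower(), pos))
            pos += 1
        for child in elem:
            visit(child)
            # text following a child element (its tail) belongs to this element
            for w in (child.tail or "").split():
                words.append((w.lower(), pos))
                pos += 1
        elements.append(Region(elem.tag, start, pos))
        pos += 1  # consume the end tag

    visit(root)
    elements.sort(key=lambda r: r.start)  # document order
    return elements, words


def select(regions, tag):
    """Select the regions of elements with a given tag name."""
    return [r for r in regions if r.tag == tag]


def contained_in(inner, outer):
    """Regions from `inner` that lie strictly inside some region of `outer`."""
    return [a for a in inner
            if any(b.start < a.start and a.end < b.end for b in outer)]


def containing_term(regions, words, term):
    """Regions that contain at least one occurrence of `term`."""
    positions = [p for w, p in words if w == term.lower()]
    return [r for r in regions if any(r.start < p < r.end for p in positions)]


doc = ("<article><title>XML retrieval</title><sec>"
       "<p>region operators for XML</p><p>probabilistic model</p>"
       "</sec></article>")
elements, words = index_regions(doc)
paras = select(elements, "p")
print(containing_term(paras, words, "XML"))
```

Because regions are plain integer intervals, the operators need no XML parser at query time. This is one way such a scheme can be kept independent of any particular database backend.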


Similar resources

Efficient XML and Entity Retrieval with PF/Tijah: CWI and University of Twente at INEX'08

PF/Tijah is a research prototype created by the University of Twente and CWI Amsterdam with the goal to create a flexible environment for setting up search systems. By integrating the PathFinder (PF) XQuery system [1] with the Tijah XML information retrieval system [2] it combines database and information retrieval technology. The PF/Tijah system is part of the open source release of MonetDB/XQ...


Report of the INEX 2003 Metrics working group

This paper summarises the discussions of the metrics working group at the INEX 2003 Workshop, Dagstuhl, Dec 15-17 2003. Members of the group were Djoerd Hiemstra (U. of Twente), Jaap Kamps (ILLC, U. of Amsterdam), Gabriella Kazai (Queen Mary U. of London), Yosi Mass (IBM Haifa), Vojkan Mihajlovic (U. of Twente), Paul Ogilvie (Carnegie Mellon U.), Jovan Pehcevski (RMIT U.), Arjen de Vries (CWI) ...


The University of Amsterdam at INEX–2002

This document describes the runs for the INEX–2002 task submitted by the Language and Inference Technology Group at the University of Amsterdam. Besides a description of our experiments some logical problems with the INEX format of the content and structure topics are discussed and an alternative is proposed.


Proceedings of the Fifth Dutch-Belgian Information Retrieval

Today's content is increasingly a mixture of text, multimedia, and metadata. One way to format this mixed content is according to the adopted W3C standard for information repositories, the so-called eXtensible Markup Language (XML). The increasing use of XML in scientific data repositories, Digital Libraries and on the Web, has brought about an explosion in the development of XML tools, and in p...


Overview of the Initiative for the Evaluation of XML retrieval (INEX) 2002

The INitiative for the Evaluation of XML retrieval (INEX) aims at providing an infrastructure for evaluating the effectiveness of content-oriented XML retrieval. In the first round of INEX, in 2002, a test collection of real world XML documents along with standard topics and respective relevance assessments has been created. Research groups from 36 different organisations participated in this c...


Publication date: 2002